Having two outcomes is like flipping a coin! Denote the probability of the outcome \(y_i=1\) by \(\theta\):
\[\text{Pr}(y_i=1) = \theta \]
where \(0\leq \theta \leq 1\). We can represent this with the following probability distribution:
\[\text{Pr}(y_i=y) = \theta^y(1-\theta)^{1-y} \]
which is known as the Bernoulli distribution:
\[\begin{equation} y_i \sim \text{Bernoulli}(\theta) \end{equation}\]
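The Bernoulli probability above can be written as a short function. This is only a minimal sketch: the name `bernoulli_pmf` and the value \(\theta=0.3\) are illustrative assumptions, not part of the original notes.

```python
def bernoulli_pmf(y, theta):
    """Return Pr(y_i = y) = theta**y * (1 - theta)**(1 - y) for y in {0, 1}."""
    if y not in (0, 1):
        raise ValueError("y must be 0 or 1")
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    return theta**y * (1.0 - theta) ** (1 - y)


# theta = 0.3 is an arbitrary illustrative value.
p_one = bernoulli_pmf(1, 0.3)   # reduces to theta
p_zero = bernoulli_pmf(0, 0.3)  # reduces to 1 - theta
print(p_one, p_zero)
```

Setting \(y=1\) leaves only the \(\theta\) factor and setting \(y=0\) leaves only \(1-\theta\), which is why the single formula covers both outcomes.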
Suppose you flip a coin twice and observe \(y_1=1\), \(y_2=0\). Assuming independence:
\[\begin{equation} \text{Pr}(y_1=1,y_2=0|\theta) = \theta \times (1-\theta). \end{equation}\]
We call \(L(\theta)=\text{Pr}(y_1=1,y_2=0|\theta)\) a likelihood.
We want to choose \(\theta\) to maximise the probability of obtaining those results.
Since \(L(\theta)=\theta-\theta^2\), we find the maximum by differentiating and setting the derivative to zero:
\[\begin{equation} \frac{d L}{d\theta} = 1 - 2 \theta = 0 \end{equation}\]
Rearranging, we obtain:
\[\begin{equation} \theta = \frac{1}{2} \end{equation}\]
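As a sanity check on the calculus, the same maximum can be found numerically. The sketch below simply evaluates \(L(\theta)=\theta(1-\theta)\) on a grid; the grid resolution and the names `likelihood`, `grid` and `theta_hat` are assumptions made for illustration.

```python
def likelihood(theta):
    """Likelihood of observing y_1 = 1, y_2 = 0 from two independent flips."""
    return theta * (1.0 - theta)


# Evenly spaced grid over [0, 1]; 1001 points is an arbitrary choice.
grid = [k / 1000 for k in range(1001)]

# Pick the grid point with the largest likelihood.
theta_hat = max(grid, key=likelihood)
print(theta_hat, likelihood(theta_hat))
```

The grid search agrees with the analytic answer \(\theta=1/2\). A grid works here because there is a single parameter; it does not scale to models with many parameters.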
In logistic regression, we use the logistic function, which maps any real \(x\) to a value between 0 and 1:
\[\begin{equation} \theta = \frac{1}{1 + \exp (-x)} \end{equation}\]
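A minimal sketch of the logistic function in code. The branch on the sign of `x` is an implementation choice (not something the notes prescribe) that avoids overflow in `math.exp` for large negative inputs; the function name `logistic` is likewise an assumption.

```python
import math


def logistic(x):
    """Logistic function 1 / (1 + exp(-x)), arranged to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0 here, so exp(x) < 1 and cannot overflow
    return z / (1.0 + z)


for x in (-5.0, 0.0, 5.0):
    print(x, logistic(x))
```

Both branches are algebraically the same expression: multiplying the numerator and denominator of \(1/(1+\exp(-x))\) by \(\exp(x)\) gives \(\exp(x)/(1+\exp(x))\).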
We want to estimate how sensitive the presence or absence of lung cancer is to tar, so we model the probability as:
\[\begin{equation} \theta_i = f_\beta(x_i) := \frac{1}{1 + \exp (-(\beta_0 + \beta_1 x_i))} \end{equation}\]
which is known as logistic regression, and we assume:
\[\begin{equation} y_i \sim \text{Bernoulli}(\theta_i) \end{equation}\]
Data for one individual \((x_i,y_i)\) have probability:
\[\text{Pr}(y_i=y) = f_\beta(x_i)^y(1-f_\beta(x_i))^{1-y} \]
Suppose we have data \((x_1,y_1=1)\) and \((x_2,y_2=0)\), and assume the observations are independent.
Then the overall probability is just the product of the individual probabilities:
\[\begin{align} L &= f_\beta(x_1)^{y_1} (1-f_\beta(x_1))^{1-y_1} f_\beta(x_2)^{y_2}(1-f_\beta(x_2))^{1-y_2}\\ &= f_\beta(x_1) (1-f_\beta(x_2)) \end{align}\]
Same logic applies under i.i.d. assumption:
\[\begin{equation} L = \prod_{i=1}^{K} f_\beta(x_i)^{y_i} (1 - f_\beta(x_i))^{1 - y_i} \end{equation}\]
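The product formula translates directly into code. In this sketch the function names, the \(x\) values and the coefficients \(\beta_0=0\), \(\beta_1=1\) are all made up for illustration; only the formula itself comes from the text. The log-likelihood is included because a sum of logs is numerically better behaved than a long product of probabilities, and it has the same maximiser.

```python
import math


def f_beta(beta0, beta1, x):
    """Modelled probability that y = 1 given x."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))


def likelihood(beta0, beta1, xs, ys):
    """Product over observations of f^y * (1 - f)^(1 - y)."""
    total = 1.0
    for x, y in zip(xs, ys):
        p = f_beta(beta0, beta1, x)
        total *= p**y * (1.0 - p) ** (1 - y)
    return total


def log_likelihood(beta0, beta1, xs, ys):
    """Sum over observations of y*log(f) + (1 - y)*log(1 - f)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = f_beta(beta0, beta1, x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


# Hypothetical two-point data set with y_1 = 1 and y_2 = 0.
xs = [1.5, 0.5]
ys = [1, 0]
beta0, beta1 = 0.0, 1.0  # arbitrary illustrative coefficients

L = likelihood(beta0, beta1, xs, ys)
print(L, log_likelihood(beta0, beta1, xs, ys))
```

For this two-point data set the product collapses to \(f_\beta(x_1)(1-f_\beta(x_2))\), matching the hand calculation.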
Unlike the simple coin-flipping case, there is no analytic solution for the maximum likelihood estimates. Instead, we iterate. Because we are maximising \(L\), each step moves along the gradient (gradient ascent, which is the same as gradient descent on \(-L\)):
\[\begin{align} \beta_0 &= \beta_0 + \eta \frac{\partial L}{\partial \beta_0}\\ \beta_1 &= \beta_1 + \eta \frac{\partial L}{\partial \beta_1} \end{align}\]
where \(\eta>0\) is the learning rate.
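A minimal sketch of the iterative fit, under several assumptions that are not in the notes: the data are invented, the learning rate and step count are arbitrary, and the updates are applied to the negative log-likelihood \(-\log L\) rather than to \(L\) itself. Minimising \(-\log L\) by gradient descent gives the same estimates as maximising \(L\), and the gradient of \(\log L\) has the simple form \(\sum_i (y_i - f_\beta(x_i))\) for \(\beta_0\) and \(\sum_i (y_i - f_\beta(x_i))\,x_i\) for \(\beta_1\).

```python
import math


def f_beta(beta0, beta1, x):
    """Modelled probability that y = 1 given x."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))


def log_likelihood(beta0, beta1, xs, ys):
    total = 0.0
    for x, y in zip(xs, ys):
        p = f_beta(beta0, beta1, x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def gradient(beta0, beta1, xs, ys):
    """Gradient of log L with respect to (beta0, beta1)."""
    g0 = sum(y - f_beta(beta0, beta1, x) for x, y in zip(xs, ys))
    g1 = sum((y - f_beta(beta0, beta1, x)) * x for x, y in zip(xs, ys))
    return g0, g1


def fit(xs, ys, eta=0.05, steps=5000):
    """Gradient descent on -log L, starting from beta0 = beta1 = 0."""
    beta0, beta1 = 0.0, 0.0
    for _ in range(steps):
        g0, g1 = gradient(beta0, beta1, xs, ys)
        # The gradient of -log L is (-g0, -g1), so descent subtracts eta times that.
        beta0 -= eta * (-g0)
        beta1 -= eta * (-g1)
    return beta0, beta1


# Hypothetical data; the classes overlap, so finite estimates exist.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 1, 0, 1, 1]

beta0_hat, beta1_hat = fit(xs, ys)
print(beta0_hat, beta1_hat, log_likelihood(beta0_hat, beta1_hat, xs, ys))
```

If the two classes were perfectly separated by some threshold on \(x\), the likelihood would keep increasing as \(\beta_1\) grows and the iteration would not settle, which is why the example data overlap.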
Suppose we estimate that \(\beta_0=-1\) and \(\beta_1=2\). What do these mean?
\[\begin{equation} \theta_i = \frac{1}{1 + \exp (-(-1 + 2 x_i))} \end{equation}\]
so the impact of incremental changes in \(x_i\) on the probability of lung cancer is nonlinear. The odds are easier to interpret. Since
\[\begin{equation} \theta_i = \frac{1}{1 + \exp (-(-1 + 2 x_i))} \end{equation}\]
we have
\[\begin{equation} 1-\theta_i = \frac{\exp (-(-1 + 2 x_i))}{1 + \exp (-(-1 + 2 x_i))} \end{equation}\]
The ratio of the probability of lung cancer to the probability of being cancer-free is called the odds:
\[\begin{align} \frac{\theta_i}{1-\theta_i} &=\exp (-1 + 2 x_i) \end{align}\]
so here \(\exp 2\approx 7.4\) is the factor by which the odds are multiplied for a one-unit increase in \(x_i\). Because of this, \(\exp \beta_1\) is known as the odds ratio for that variable.
Taking log of both sides:
\[\begin{equation} \log \frac{\theta_i}{1-\theta_i} = -1 + 2 x_i \end{equation}\]
so we see that \(\beta_1=2\) effectively gives the change to the log-odds for a one unit change in \(x_i\).
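The interpretation can be checked numerically with the estimates \(\beta_0=-1\) and \(\beta_1=2\) from the text. In this sketch the helper names `prob` and `odds` and the chosen \(x\) values are assumptions for illustration.

```python
import math

beta0, beta1 = -1.0, 2.0  # the estimates used in the text


def prob(x):
    """Modelled probability of lung cancer at tar level x."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))


def odds(x):
    """Probability of cancer divided by probability of being cancer-free."""
    p = prob(x)
    return p / (1.0 - p)


for x in (0.0, 1.0, 2.0):  # arbitrary illustrative values
    ratio = odds(x + 1.0) / odds(x)
    log_odds_change = math.log(odds(x + 1.0)) - math.log(odds(x))
    prob_change = prob(x + 1.0) - prob(x)
    print(f"x={x}: odds ratio={ratio:.3f}, "
          f"log-odds change={log_odds_change:.3f}, "
          f"probability change={prob_change:.3f}")
```

Whatever the starting value of \(x\), the odds ratio equals \(\exp 2\) and the log-odds change equals 2, while the change in probability differs from one starting point to the next. That contrast is the nonlinearity noted earlier.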
It is straightforward to extend the model to incorporate multiple regressors:
\[\begin{equation} f_\beta(x_i) := \frac{1}{1 + \exp (-(\beta_0 + \beta_1 x_{1,i} + \dots + \beta_p x_{p,i}))} \end{equation}\]
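A sketch of the multi-regressor version, assuming the coefficients are stored as a list `beta = [beta_0, beta_1, ..., beta_p]` and one individual's regressors as a list `x = [x_1, ..., x_p]`. That layout, the function name and the example numbers are illustrative choices only.

```python
import math


def f_beta(beta, x):
    """Modelled probability with an intercept beta[0] and p regressors."""
    if len(beta) != len(x) + 1:
        raise ValueError("beta needs one more entry than x (the intercept)")
    # Linear predictor: beta_0 + beta_1 * x_1 + ... + beta_p * x_p
    linear_predictor = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical individual with two regressors.
p_hat = f_beta([-1.0, 2.0, 0.5], [0.25, 1.0])
print(p_hat)
```

With a single regressor this reduces to the two-parameter model used earlier, and each \(\exp \beta_j\) keeps its odds-ratio interpretation with the other regressors held fixed.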